torchao: safetensors save/load + disk group offload (closes #13713)#13721

Open

itzzdeep wants to merge 1 commit into huggingface:main from itzzdeep:torchao-safetensors

Conversation

@itzzdeep

What does this PR do?

Fixes #13713

Description

Adds safetensors save/load and disk-based group offloading for TorchAO-quantized models, implemented via four no-op hooks on DiffusersQuantizer that TorchAoHfQuantizer overrides:

  • get_state_dict_and_metadata — flatten subclasses on save
  • set_metadata — read torchao header from each shard on load
  • update_loaded_keys — collapse _weight_* suffixes to canonical names
  • update_state_dict_with_metadata — per-shard unflatten with cross-shard buffer

Group offload reuses the same path, so no torchao-specific code lands in modeling_utils. Requires torchao ≥ 0.15.0 and version=2 configs; loading a v1 config raises a ValueError that names the fix.
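As an illustrative sketch only (not the actual diffusers code; the class names, method signatures, and the `_weight_*` suffix names below are hypothetical), the no-op base-class hooks and one torchao override might look roughly like this:

```python
class DiffusersQuantizerSketch:
    """Hypothetical base class with no-op hook defaults."""

    def get_state_dict_and_metadata(self, model, safe_serialization=False):
        # Default: plain state dict, no extra metadata for the safetensors header.
        return model.state_dict(), {}

    def set_metadata(self, metadata):
        # Default: nothing to record from a shard's header on load.
        pass

    def update_loaded_keys(self, keys):
        # Default: keys pass through unchanged.
        return list(keys)

    def update_state_dict_with_metadata(self, state_dict, metadata):
        # Default: the state dict is already in its final form.
        return state_dict


class TorchAoQuantizerSketch(DiffusersQuantizerSketch):
    # Hypothetical suffixes produced when a tensor subclass is flattened on save.
    SUBCLASS_SUFFIXES = ("_weight_qdata", "_weight_scale")

    def update_loaded_keys(self, keys):
        # Collapse flattened "_weight_*" keys back to the canonical ".weight" name,
        # deduplicating so each parameter appears once.
        out = []
        for key in keys:
            for suffix in self.SUBCLASS_SUFFIXES:
                if key.endswith(suffix):
                    key = key[: -len(suffix)] + "weight"
                    break
            if key not in out:
                out.append(key)
        return out
```

The point of the no-op defaults is that modeling_utils can call the hooks unconditionally; only quantizers that need them (here, torchao) override anything.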

Results

End-to-end verified on Lumina-Image-2.0 (2.61B params, A100, Int8 weight-only).

| GPU memory | Baseline | Disk group offload |
| --- | --- | --- |
| Peak during forward | 2.53 GB | 0.16 GB |
| Resident between forwards | 2.47 GB | 0.009 GB |

Save → reload is bit-perfect (cosine_dist = 0.0). Forward latency rises from 87 ms (baseline) to 1592 ms (disk group offload).
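The bit-perfect check reduces to a cosine distance over the flattened forward outputs; a minimal stdlib-only sketch of the metric (the PR presumably compares torch tensors, but the computation is the same):

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity over two flattened output vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Identical outputs before save and after reload give a distance of ~0.0;
# orthogonal vectors give 1.0.
out_before = [0.5, -1.2, 3.0, 0.0]
out_after = list(out_before)  # stand-in for the output after save -> reload
print(cosine_distance(out_before, out_after))
```

A distance of exactly 0.0 (up to floating-point rounding) is what the PR reports for the save → reload round trip.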

Tests

TorchAoSerializationTest passes 7/7 and test_group_offloading.py passes 38/38. A real Flux.1-Dev save/reload works; disk group offload on Flux is blocked by pre-existing torchao bugs.

[x] Did you write any new necessary tests?
Added test_safetensors_save_int_a16w8 and test_group_offload_to_disk_int_a16w8 in tests/quantization/torchao/test_torchao.py.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul

@sayakpaul
Member

Thanks for your contributions. However, #13719 was opened before this one, so we will go with #13719. I am sure there will be future contribution opportunities.


@wadeKeith left a comment


Solid improvement - safetensors save/load plus disk group offload for torchao. Properly closes #13713. Good test coverage in the quantizer module. LGTM! Reviewed by Hermes Agent.


Development

Successfully merging this pull request may close these issues.

Incorporate safetensors support to TorchAO

3 participants